Search CORE

5,660 research outputs found

$k$-MLE: A fast algorithm for learning statistical mixture models

Author: Nielsen Frank
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 23/03/2012

We describe $k$-MLE, a fast and efficient local search algorithm for learning finite statistical mixtures of exponential families such as Gaussian mixture models. Mixture models are traditionally learned using the expectation-maximization (EM) soft clustering technique, which monotonically increases the incomplete (expected complete) likelihood. Given prescribed mixture weights, the hard clustering $k$-MLE algorithm iteratively assigns data to the most likely weighted component and updates the component models using maximum likelihood estimators (MLEs). Using the duality between exponential families and Bregman divergences, we prove that the local convergence of the complete likelihood of $k$-MLE follows directly from the convergence of a dual additively weighted Bregman hard clustering. The inner loop of $k$-MLE can be implemented using any $k$-means heuristic, such as the celebrated Lloyd's batched updates or Hartigan's greedy swap updates. We then show how to update the mixture weights by minimizing a cross-entropy criterion, which amounts to setting each weight to the relative proportion of points in its cluster, and we repeat the mixture parameter update and the mixture weight update until convergence. Hard EM is interpreted as a special case of $k$-MLE in which both the component update and the weight update are performed successively in the inner loop. To initialize $k$-MLE, we propose $k$-MLE++, a careful initialization of $k$-MLE that probabilistically guarantees a global bound on the best possible complete likelihood. Comment: 31 pages, Extends a preliminary paper presented at IEEE ICASSP 201

arXiv.org e-Print Archive

Crossref
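The alternation described in the $k$-MLE abstract can be sketched in Python for the simplest case, a univariate Gaussian mixture. This is a minimal illustration under stated assumptions, not the author's implementation: it uses a Lloyd-style batched inner loop, a deterministic quantile initialization in place of $k$-MLE++, synthetic data, and hypothetical names (`k_mle_gmm_1d`, `min_var`) chosen for the example. It relies on NumPy.

```python
import numpy as np

def log_gauss(x, mu, var):
    """Log-density of the univariate Gaussian N(mu, var) evaluated at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)

def k_mle_gmm_1d(x, k, n_outer=50, n_inner=50, min_var=1e-6):
    """Hard-clustering k-MLE sketch for a univariate Gaussian mixture (hypothetical helper).

    Inner loop (Lloyd-style): with the weights fixed, assign each point to the
    component maximizing log w_j + log p_j(x), then refit each component by
    maximum likelihood (sample mean and sample variance of its cluster).
    Outer loop: reset each weight to the relative size of its cluster.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    # Deterministic quantile seeding (an assumption, NOT the paper's k-MLE++).
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    labels = np.full(n, -1)
    for _ in range(n_outer):
        for _ in range(n_inner):
            # Weighted log-likelihood of every point under every component, shape (n, k).
            scores = np.log(w) + log_gauss(x[:, None], mu, var)
            new_labels = scores.argmax(axis=1)
            if np.array_equal(new_labels, labels):
                break  # assignments are stable: the inner loop has converged
            labels = new_labels
            for j in range(k):
                pts = x[labels == j]
                if pts.size > 0:  # an empty cluster keeps its previous parameters
                    mu[j] = pts.mean()
                    var[j] = max(pts.var(), min_var)  # MLE variance, floored
        # Weight update: relative proportion of points in each cluster,
        # floored so that log(w) stays finite for empty clusters.
        new_w = np.maximum(np.bincount(labels, minlength=k) / n, 1e-12)
        new_w /= new_w.sum()
        if np.allclose(new_w, w):
            break
        w = new_w
    return w, mu, var, labels

# Usage on synthetic data: two well-separated clusters with proportions 0.3 / 0.7.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-5.0, 1.0, 300), rng.normal(5.0, 1.0, 700)])
w, mu, var, labels = k_mle_gmm_1d(data, k=2)
print("weights:", np.round(w, 3), "means:", np.round(mu, 2), "variances:", np.round(var, 2))
```

Because each point goes to the component with the largest weighted log-likelihood, the weights enter the assignment only as additive terms $\log w_j$, which matches the additive weighting of the dual Bregman hard clustering mentioned in the abstract. The variance floor `min_var` is an added safeguard against degenerate single-point clusters.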

Generalized Bhattacharyya and Chernoff upper bounds on Bayes error using quasi-arithmetic means

Author: Nielsen Frank
Publication venue: 'Elsevier BV'
Publication date: 19/01/2014

Bayesian classification labels observations based on given prior information, namely class a priori and class-conditional probabilities. Bayes' risk is the minimum expected classification cost, which is achieved by the Bayes' test, the optimal decision rule. When correct classification incurs no cost and each misclassification costs one unit, Bayes' test reduces to the maximum a posteriori decision rule, and Bayes' risk simplifies to Bayes' error, the probability of error. Since calculating this probability of error is often intractable, several techniques have been devised to bound it with closed-form formulas, thereby introducing measures of similarity and divergence between distributions such as the Bhattacharyya coefficient and its associated Bhattacharyya distance. The Bhattacharyya upper bound can be further tightened using the Chernoff information, which relies on the notion of best error exponent. In this paper, we first express Bayes' risk using the total variation distance on scaled distributions. We then elucidate and extend the Bhattacharyya and Chernoff upper bound mechanisms using generalized weighted means. As a byproduct, we provide novel notions of statistical divergences and affinity coefficients. We illustrate our technique by deriving new upper bounds for the univariate Cauchy and the multivariate $t$-distributions, and show experimentally that those bounds are not too distant from the computationally intractable Bayes' error. Comment: 22 pages, includes R code. To appear in Pattern Recognition Letters

arXiv.org e-Print Archive
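To make the classical bounds mentioned in the Bayes' error abstract concrete, here is a minimal Python sketch (assuming NumPy) for two univariate Gaussian classes with hypothetical priors and parameters. It computes the Bayes' error numerically, checks the total variation identity $P_e = \tfrac{1}{2}\bigl(1 - \int |w_1 p - w_2 q|\bigr)$, and compares it with the Bhattacharyya and Chernoff upper bounds, both of which follow from $\min(a,b) \le a^{\alpha} b^{1-\alpha}$ for $\alpha \in [0,1]$. It illustrates only these standard bounds, not the paper's generalized quasi-arithmetic-mean bounds or its Cauchy and $t$-distribution results.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Density of the univariate Gaussian N(mu, sigma^2)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Two-class problem with hypothetical priors w1, w2 and class-conditional densities p, q.
w1, w2 = 0.4, 0.6
mu1, s1, mu2, s2 = 0.0, 1.0, 2.0, 1.5

# Uniform grid for numerical integration (Riemann sums).
x = np.linspace(-30.0, 30.0, 600001)
dx = x[1] - x[0]
p, q = gauss_pdf(x, mu1, s1), gauss_pdf(x, mu2, s2)

# Bayes' error of the MAP rule: integral of min(w1 p, w2 q), equivalently
# (1 - integral |w1 p - w2 q|) / 2, i.e. total variation on the scaled densities.
bayes_error = np.sum(np.minimum(w1 * p, w2 * q)) * dx
bayes_error_tv = 0.5 * (1.0 - np.sum(np.abs(w1 * p - w2 * q)) * dx)

# Bhattacharyya bound: P_e <= sqrt(w1 w2) * BC(p, q), using the closed-form
# Bhattacharyya distance between two univariate Gaussians.
d_b = (mu1 - mu2) ** 2 / (4.0 * (s1**2 + s2**2)) + 0.5 * np.log((s1**2 + s2**2) / (2.0 * s1 * s2))
bhattacharyya_bound = np.sqrt(w1 * w2) * np.exp(-d_b)

# Chernoff bound: minimize w1^a w2^(1-a) * integral p^a q^(1-a) over a grid of exponents a.
alphas = np.linspace(0.01, 0.99, 99)
chernoff_vals = [w1**a * w2 ** (1.0 - a) * np.sum(p**a * q ** (1.0 - a)) * dx for a in alphas]
chernoff_bound = min(chernoff_vals)

print(f"Bayes' error (min form): {bayes_error:.6f}")
print(f"Bayes' error (TV form):  {bayes_error_tv:.6f}")
print(f"Chernoff bound:          {chernoff_bound:.6f}")
print(f"Bhattacharyya bound:     {bhattacharyya_bound:.6f}")
```

Since $\alpha = 1/2$ lies on the grid, the Chernoff bound can never be looser than the Bhattacharyya bound here; the grid search over $\alpha$ is a simple stand-in for computing the optimal error exponent exactly.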

On a generalization of the Jensen-Shannon divergence and the JS-symmetrization of distances relying on abstract means

Author: Nielsen Frank
Publication venue: 'MDPI AG'
Publication date: 10/12/2020

The Jensen-Shannon divergence is a renowned bounded symmetrization of the unbounded Kullback-Leibler divergence, which measures the total Kullback-Leibler divergence to the average mixture distribution. However, the Jensen-Shannon divergence between Gaussian distributions is not available in closed form. To bypass this problem, we present a generalization of the Jensen-Shannon (JS) divergence using abstract means, which yields closed-form expressions when the mean is chosen according to the parametric family of distributions. More generally, we define the JS-symmetrizations of any distance using generalized statistical mixtures derived from abstract means. In particular, we first show that the geometric mean is well-suited for exponential families, and report two closed-form formulas for (i) the geometric Jensen-Shannon divergence between probability densities of the same exponential family, and (ii) the geometric JS-symmetrization of the reverse Kullback-Leibler divergence. As a second illustrative example, we show that the harmonic mean is well-suited for the scale Cauchy distributions, and report a closed-form formula for the harmonic Jensen-Shannon divergence between scale Cauchy distributions. We also define generalized Jensen-Shannon divergences between matrices (e.g., quantum Jensen-Shannon divergences) and consider clustering with respect to these novel Jensen-Shannon divergences. Comment: 30 pages

arXiv.org e-Print Archive
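Why the geometric mean suits exponential families can be seen on univariate Gaussians: the normalized geometric mean $\sqrt{pq}/Z$ of two Gaussians is again Gaussian, with natural parameters equal to the average of those of $p$ and $q$, so the divergence reduces to closed-form Kullback-Leibler terms. The Python sketch below (assuming NumPy) assumes the equal-weight definition $\mathrm{JS}^G(p:q) = \tfrac{1}{2}\bigl(\mathrm{KL}(p:g) + \mathrm{KL}(q:g)\bigr)$ with $g = \sqrt{pq}/Z$, and cross-checks the closed form against numerical integration for hypothetical parameters. The helper names are illustrative, not taken from the paper.

```python
import numpy as np

def kl_gauss(m1, v1, m2, v2):
    """Closed-form KL(N(m1, v1) || N(m2, v2)) between univariate Gaussians (v = variance)."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def geometric_gauss_mean(m1, v1, m2, v2):
    """Normalized geometric mean sqrt(p q) / Z of two Gaussians.

    It is again Gaussian: its natural parameters (m / v, -1 / (2 v)) are the
    average of those of p and q."""
    vg = 1.0 / (0.5 * (1.0 / v1 + 1.0 / v2))
    mg = vg * 0.5 * (m1 / v1 + m2 / v2)
    return mg, vg

def geometric_js_gauss(m1, v1, m2, v2):
    """Equal-weight geometric JS divergence: (KL(p : g) + KL(q : g)) / 2."""
    mg, vg = geometric_gauss_mean(m1, v1, m2, v2)
    return 0.5 * (kl_gauss(m1, v1, mg, vg) + kl_gauss(m2, v2, mg, vg))

def pdf(x, m, v):
    """Density of N(m, v) on a grid."""
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2.0 * np.pi * v)

# Numerical cross-check on a grid (hypothetical parameters).
m1, v1, m2, v2 = 0.0, 1.0, 1.0, 4.0
x = np.linspace(-20.0, 20.0, 400001)
dx = x[1] - x[0]
p, q = pdf(x, m1, v1), pdf(x, m2, v2)
g = np.sqrt(p * q)
g /= np.sum(g) * dx  # normalize the geometric mixture numerically

def kl_numeric(a, b):
    """Riemann-sum approximation of KL(a || b) on the grid."""
    return np.sum(a * np.log(a / b)) * dx

closed_form = geometric_js_gauss(m1, v1, m2, v2)
numeric = 0.5 * (kl_numeric(p, g) + kl_numeric(q, g))
print(f"closed form: {closed_form:.8f}  numerical: {numeric:.8f}")
```

By contrast, the ordinary Jensen-Shannon divergence uses the arithmetic mixture $(p+q)/2$, which is not Gaussian, and that is why it has no closed form between Gaussians.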

Cramer-Rao Lower Bound and Information Geometry

Author: Nielsen Frank
Publication date: 23/01/2013

This article focuses on an important piece of work by the world-renowned Indian statistician Calyampudi Radhakrishna Rao. In 1945, C. R. Rao (then 25 years old) published a pathbreaking paper that had a profound impact on subsequent statistical research. Comment: To appear in Connected at Infinity II: On the work of Indian mathematicians (R. Bhatia and C.S. Rajan, Eds.), special volume of Texts and Readings In Mathematics (TRIM), Hindustan Book Agency, 201

arXiv.org e-Print Archive

Derivatives of Multilinear Functions of Matrices

Author: Bhatia Rajendra
Nielsen Frank
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Perturbation or error bounds of functions have long been of great interest. If the functions are differentiable, then the mean value theorem and Taylor's theorem come in handy for this purpose. While the former is useful in estimating $\|f(A+X)-f(A)\|$ in terms of $\|X\|$ and requires the norms of the first derivative of the function, the latter is useful in computing higher-order perturbation bounds and needs the norms of the higher-order derivatives of the function. In the study of matrices, the determinant is an important function. Other scalar-valued functions, such as eigenvalues and the coefficients of the characteristic polynomial, are also well studied. Another interesting function in this category is the permanent, an analogue of the determinant in matrix theory. More generally, there are operator-valued functions, such as tensor powers, antisymmetric tensor powers and symmetric tensor powers, which have gained importance in the past. In this article, we give a survey of recent work on the higher-order derivatives of these functions and their norms. Using Taylor's theorem, higher-order perturbation bounds are obtained. Some of these results are very recent, and their detailed proofs will appear elsewhere. Comment: 17 pages

arXiv.org e-Print Archive

CERN Document Server
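As a small numerical illustration of the first-order perturbation idea in the last abstract, the sketch below (assuming NumPy) uses Jacobi's formula for the derivative of the determinant, $D\det(A)[X] = \operatorname{tr}(\operatorname{adj}(A)\,X) = \det(A)\operatorname{tr}(A^{-1}X)$ for invertible $A$, and checks that the Taylor remainder $\det(A+tX) - \det(A) - t\,D\det(A)[X]$ shrinks roughly like $t^2$. The random test matrices and the helper name `ddet` are hypothetical; the surveyed higher-order derivative formulas are not reproduced here.

```python
import numpy as np

def ddet(A, X):
    """First derivative of det at A in direction X, via Jacobi's formula.

    Uses adj(A) = det(A) * inv(A), so A is assumed invertible; tr(A^{-1} X)
    is computed with a linear solve instead of an explicit inverse."""
    return np.linalg.det(A) * np.trace(np.linalg.solve(A, X))

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)  # shifted to be safely invertible
X = rng.standard_normal((4, 4))

# First-order Taylor expansion: the remainder should scale roughly like t**2.
remainders = {}
for t in (1e-1, 1e-2, 1e-3):
    exact = np.linalg.det(A + t * X) - np.linalg.det(A)
    remainders[t] = abs(exact - t * ddet(A, X))
    print(f"t={t:g}  remainder={remainders[t]:.3e}  remainder/t^2={remainders[t] / t**2:.4f}")
```

The ratio `remainder/t^2` approaching a constant reflects the second-order Taylor term, which is exactly what norms of the second derivative of the determinant control in higher-order perturbation bounds.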